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A fast algorithm to study one-dimensional self-gravitating systems, and, 
more generally, systems that are Lagrangian integrable between collisions, 
is presented. The algorithm is event-driven, and uses a heap-ordered set 
of predicted future events. In the limit of large number of particles N, the 
operation count is dominated by the cost of reordering the heap after each 
event, which goes asymptotically as log N. Some applications are discussed 
in detail. 



1. INTRODUCTION 

In this paper we discuss a fast algorithm which integrates numerically a one- 
dimensional system of N interacting particles, provided the dynamics can be La- 
grangian integrated between two successive collisions. An important application 
is self-gravitating systems, since the gravitational force is Lagrangian invariant in 
one dimension. Several similar models with Lagrangian invariant or quasi-invariant 
force fields can also be treated. 

By computing all possible collision times between particles, we can select the 
smallest of these and let the particles evolve until this time is reached. The particles 
are then made to collide according to the prescriptions of the dynamics. It is clear 
that in such a scheme the most time-consuming operation is the search of the 
minimal collision time. In one dimension, the number of possible collisions between 
N particles is N — 1, because the set of positions is well-ordered. This seems to 
imply immediately an O(N) operations count for each collision. Indeed, if we order 
the set of collision times, finding the minimum takes 0(1) operations, but inserting 
a new collision time in the list will take O(N) operations. On the other hand, if 
we keep the set unordered, adding a new element will take 0(1) operations, but 
finding the minimum will take 0(N). The essence of an efficient algorithm is to use 
a data structure that simultaneously permits fast insertion and fast search of the 
minimum. This is exactly the aim of the heap structure, well known in algorithmic 
design [6, 7, 16]. Although known since a rather long time, it is only recently that 
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the heap concept has been used in physical problems, like front propagation [17] or 
molecular dynamics simulations of hard-sphere systems [9, 10, 13]. In this paper 
we extend this technique to systems with force fields acting between collisions, 
provided these fields are Lagrangian invariant, or quasi-invariant. 

The paper is organized as follows. In section 2 we introduce the concept of 
a heap [6], and discuss code speed implementation issues. Section 3 shows how 
an efficient event-driven evolution scheme can be implemented by using the heap 
structure. In section 4, we apply this code to the numerical solution of two different 
one-dimensional systems of particles : the free streaming motion, i.e. the evolution 
of TV non- interacting particles, and a classical self-gravitating system [12, 18, 19]. 
This section also contains checks on the speed of the algorithm. In the conclusion 
section, we sum up our results and discuss future applications. 



2. THE HEAP 

The discussion in this section mainly follows [6], the classical references on the 
subject being [7, 16]. 

The heap structure is based on the concept of a tree. A tree is formally defined 
to be a set of nodes, connected by links, such that the path to go from one node to 
another is unique. A rooted tree is a tree with distinguished node, the root, which 
is usually by convention drawn on the top. A rooted tree can be defined recursively 
as follows : a rooted tree consists of a root node, and a finite set of subtrees, which 
are themselves rooted trees. Every node has a unique parent, except the root. The 
children of a node i are the roots of the subtrees, under i. A node which has no 
children is called a leaf, and is usually drawn at the bottom. Binary trees are the 
most commonly used : they consist of a root node and at most two subtrees which 
are themselves binary trees. 

A binary tree of N elements is labeled T(N). The nodes can be graded in 
levels. The children of a node on level / are on level I — 1. The height h is a 
lower bound on the number of levels needed to build a binary tree of N elements, 
with h = [~log 2 -Af| +1, as is illustrated in figure 1. In the following we will use 
complete and left-justified binary trees, which are such that all the leaves are on 
level 2 or 1, and those on level 1 are drawn from the left. The nodes in a complete 
and left-justified binary tree can be numbered from the top to the bottom, and left 
to right, such that the children of node i are respectively 2i and 2i + 1, while its 
parent (if any) is [i/2\ . Such a tree can be represented by an array with the array 
index equal to the node number in the tree. 

In a heap-ordered tree, an element is associated to each node, and the elements 
obey the heap condition : each element in a child node is greater than or equal to 
the elements in its parent node. An array representing a tree satisfying the heap 
condition is called a heap [7, 16]. 

Given N objects, the following strategy can be used to put them in a heap : the 
first element is placed on the top of the tree, which corresponds to the first element 
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FIG. 1. Example of an ordered complete binary tree. 

of the array. The next element is placed in the second position, and after comparison 
eventually replaces the first, if smaller. This operation is called sift-up [16] 1 . 

This procedure is iterated : a new entry is placed at the end of the array, and a 
comparison with its parent is performed. If smaller, the element is exchanged with 
the parent and is again compared to its new parent, and so on. As a result, each 
new element moves on an ascending path on the tree until the heap condition is 
satisfied. 

The above described simple procedure to initially order an array in a heap 
(bottom-up) takes 0(iVlog N) operations at the most. A faster scheme (top-down), 
that only takes O(N) operations in the worst case is however possible [7, 16]. More- 
over, once the heap is built, the operational cost of inserting or substituting a new 
element in a heap is at most O(logiV), the height of the tree, and the selection of 
the minimum is a trivial 0(1) operation. The heap is therefore a well adapted data 
structure for both finding a minimum and replacing elements in an array. 

In many applications, elements have to be sorted according to different criteria. 
It is thus not possible to put them in a single heap, and it would be very inefficient 
to duplicate data in different heaps. The solution to this problem is to use a 
single instance of all elements (that might already be sorted itself according to 
one criterion), and to use indirect heap(s) containing only pointers to them [16]. 



1 The inverse operation (sift-down) makes an element move down through the tree (percolate) 
until the heap condition is restored. This operation is typically needed when the heap has to be 
re-ordered after replacement of an clement. 
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When comparing elements while moving through the heap, we access them through 
the (current) heap pointers and, if the elements don't satisfy the heap condition, 
we simply exchange their pointers without moving the elements themselves. This 
procedure can be very efficient if elements carry lots of related data, but it leads to 
a large amount of memory traffic while moving through the heap, because of the 
need to indirect through the pointers (but see below). 

The concept of trees, and thus of heaps, can be generalized to bases larger than 
two [8]. In a base r tree, each node has at most r subtrees so that, in a complete 
left-justified r-ary tree, the children of node i are ri + 2 — r, . . . , ri + 1 while the 
parent of node i is — 2 + r)/r\ = \(i — l)/r]. It is now clear that we face 
two conflicting requirements : on one hand, we would like to minimize the height 
of the tree h = \og r N by choosing r as large as possible ; on the other hand, 
the work needed at each level of the tree to find the smallest of the children, e.g. 
in a sift-down, increases linearly with r, so we would like to keep it small. Blind 
minimization of the expression r \og r N suggests that the best branching ratio would 
be e, the base of natural logarithms. However, the processing of each level also 
incurs some work independent of r, and it is thus better to choose some higher 
(integer) value. This becomes even more important on modern microprocessors 
that access memory through caches which are filled in bursts of typically 4, 8, or 
16 words. Because the children of a node are stored consecutively in memory, it is 
then possible to use the fetching of a cache line to load all children of one node at 
the same time. This however also requires the heap to be cache- aligned [8], which 
can be realized by fiddling with the base address returned by the memory allocation 
routines in a language like C. On the microprocessors we used (Alpha, Pentium, 
MIPS), we found that base 4 was much better than base 2, while bases 8 and higher 
were slightly slower than base 4. The gain in speed by using aligned base 4 heaps 
with respect to unaligned base 2 ones is significant, about 25% on a Pentium with 
15% coming from the choice of base and 10% from the memory alignment. 

To get all the benefits of aligned large base heaps, the comparison keys have to 
be really present in the heap and not accessed through pointers that would incur 
extra memory loads. We thus implemented semi-indirect heaps in which the keys 
are placed inside of the heap, so they can be compared directly, while pointers to 
the corresponding elements are located in an array parallel to the heap. Because 
pointers have to be exchanged only when doing a swap, this implementation reduces 
nearly by half the number of memory accesses (to at most r + 1 memory loads and 
3 memory writes if a swap occurs in a sift-down) and is faster than other priority 
queue implentations like those decribed in [9]. 

3. THE ALGORITHM 

We consider the motion of N colliding particles in a one-dimensional medium. The 
interaction is not specified at this level : we only require that the equation of motion 
for a particle can be integrated in between two successive collisions. Arrays of size N 
contain the states of the particles, such as position, velocity and acceleration, at the 
time of their last collision, stored in increasing order of the spatial coordinates. An 
additional state variable associated to each particle is tj , the time it last experienced 
a collision. Initially all tj are set to zero. 
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FIG. 2. This figure shows the structure of a semi-indirect heap and the function of the 
two "shuffling" arrays PH [•] and HP [•] . The first array in the figure only contains the predicted 
collision times ordered as a heap, while the second contains the particle states stored in increasing 
order of spatial positions. The two indexing arrays allow to move back and forth between the two 
sets. 



The algorithm starts by computing the collision time of each particle with its 
neighbor to the right, and the results are stored in an array of size N — 1. This 
array is then turned into a heap, T(N — 1). So that we do not need to move 
the whole states of particles while processing the heap, we introduce an indexing 
array, Particle-Heap (P !![•]), mapping the position in the heap to the position in 
space. Referring to figure 2 if index I labels the position of the collision time in 
the heap, then j — PH[l] is the index in space of the leftmost of the two particles 
(j and j + 1) involved in that collision. To update the list of predicted collision 
times of neighbors particles, we also need the index array inverse to Particle-Heap, 
which we call Heap-Particle (HP[»]). Hence for all j in the range 1 to N — 1 

PH[HP\j]]=j and HP[PH\j]]=j. (1) 

This condition will be preserved at all times while we update the heap. Note that 
the collison times are really directly present in the heap, and that the two indexing 
arrays then realize exactly the functions needed to implement the semi-indirect 
heap. 

The initial forming of the heap requires O(N) operations, as stated in the previous 
section. Once the heap has been built, the minimum collision time i m i n is at the 
root. The particles involved in the first collision, which are j = P H [1] and j + 1 , are 
selected, and their states evolved up to i m i n - The two particles are then at the same 
spatial position and their states are rearranged by the collision (momenta simply 
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FIG. 3. Intersection of the trajectories of particles j and j ' + 1 at time t = t m i n . The ringed 
intersections are the collision/crossings that need to be recomputed. 



exchanged in the case of elastic collision). The times tj and Tj+i, are set equal 
to i m in- This procedure conserves the monotonicity of the particle positions both 
at and after the collision. Next the new predicted collision times between j and j + 1 
is computed and replaces the one at the root of the tree. The root might now not 
fulfill the heap condition, so the array probably has to be re-arranged with a sift- 
down of the root value. We need to check whether the new value is still less than the 
values in its children. If this is not the case, it is percolated down the tree until the 
heap condition is satisfied. This procedure involves at most O(logiV) operations, 
as discussed previously. 

Because of the changes of the states of particles j and j + 1 , their collisions with 
their other nearest neighbors, j — 1 and j + 2 need to be re-computed, see figure 3. 
To do this, particles j — 1 and j + 2 are temporarily moved forward in time up 
to i m ; n , where their new collision times with, respectively, particles j and j + 1 are 
computed, and put into the heap, replacing the old ones. As a consequence, the 
heap has to be re-arranged two more times, again at a cost of at most O(logTV) for 
each modification. 

The heap is now again in a consistent state, with the next collision time at the 
root, and the whole procedure can be repeated. The evolution can be stopped cither 
after some fixed number of collisions Z, or when the predicted time for the next 
collision becomes larger than some chosen final time T en< ^. In the end, all particles 
are moved forward in time from their own tj to the final time which is either T onc i 
or the time of the last collision. In conclusion, the complexity of the algorithm is 
in the worst-case 0(Z log N) plus lower-order terms 0{Z) and O(N). 
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FIG. 4. An clastic collision in the (x, t) plane : the solid line traces the motion of particle j, 
while the dashed line shows particle j + 1. 

4. APPLICATIONS 

In this section we discuss two applications which we have used for testing the 
performance of the algorithm. In the first, we consider ./V freely moving particles 
in one spatial dimension. The second problem we investigate is the evolution of a 
Newtonian self-gravitating system in l-D [2, 3, 12, 19]. 

4.1. Free motion 

Consider TV particles all of the same mass m normalized to be iV _1 . Denote the 
position and the velocity of the jth particle as respectively Xj and Vj . The particles 
move freely, hence neither interact with each other, nor do they feel any external 
force. Their acceleration is thus identically equal to zero. In the plane (x,t), each 
particle moves on a straight line, the slope of which is its velocity. The intersection 
of two lines represents a crossing or a collision in physical space. Each time two 
particles encounter, i.e. Xj = Xj + \, they cross each other. Equivalcntly, collisions 
can be thought of as elastic, since one-dimensional motion of indistinguishable 
particles is equivalent to a system of one-dimensional impenetrable mass points, 
which bounce elastically off one another (see Fig. 4). In the latter case, colliding 
particles exchange their velocities : in the (x, t) plane the trajectory of a single 
particle is then a broken line. 

We remark that the system of non-interacting particles is not trivial in Eulerian 
coordinates, and has been used as a model of structure formation in the early Uni- 
verse, the so-called Zeldovich approximation. Furthermore, as long as the solution 
stays single-stream, i.e. before any collision has occurred, the Zeldovich approxi- 
mation is in some sense exact in one dimension [20, 4]. After a first collision, when 
in the continuum limit a caustic has been formed and the solution is no longer 
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FIG. 5. Phase space portraits of the free motion starting from initial velocity on double 
sine wave. The first caustic is formed at time t = (4-7r) _1 (dotted line). After caustic formation, 
velocity is a multi-valued function of position and in that region, a Zeldovich pancake, or blini, 
carries an increasing fraction of the mass. 

single-stream, the dynamics of a self-gravitating system can nevertheless be proven 
to stay close to the Zeldovich approximation for short enough times [14]. 

Fig. 5 shows a phase-space portrait of this dynamics with particles initially uni- 
formly distributed in space, and velocity a smooth function of position. Fig. 6 shows 
the same dynamics in the (x, t) plane, and clearly displays structure formation in 
Eulerian coordinates (caustics). 

4.2. Self-gravitating systems 

Consider now a one-dimensional (classical) Newtonian self-gravitating system of 
N particles, again all of the same mass m = iV _1 . The Hamiltonian is : 



where Xj is the position of particle j, pj = mvj is the momentum conjugate to Xj 
and G is the gravitational constant [12, 19]. We choose as unit of length the 
spatial interval in which the particles are initially contained. The initial density po 
is thus equal to one. The natural choice of time scale is the inverse of the Jeans 
frequency u)j = (AirGpo) 1 ^ 2 . With our choice of length this implies that we take 
equal to one. 

Inbetween two collisions, the acceleration of each particle is constant, and is pro- 
portional to the difference of number of particles on its right and on its left. In the 
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FIG. 6. Dynamics in the x, t plane for free motion with the same initial conditions as Fig. 5. 
The two caustics are clearly visible. For long times, all particles will fly away to infinity. 



(x, t) plane, the path of a particle between collisions thus follows a parabola. If the 
particles are assumed to pass through each other freely, then in a collision, veloci- 
ties are unchanged but accelerations are exchanged. If, on the other hand, particles 
are assumed to scatter elastically, then in a collision, velocities are exchanged but 
accelerations are unchanged. Switching from one interpretation to the other only 
involves keeping track of the permutation relating the current rank of the particles 
to their initial rank. This can be realized algorithmically by bookkeeping at each 
collision two indexing arrays, inverse of each other in a way similar to the HP[»] 
and PH[»] arrays, but this time holding the relations between the initial and the 
current particle rank. 

Evolving the system thus involves basically two operations: finding the next 
collision time of a pair of particles and moving these to their common collision time. 
Although seemingly simple, these two operations contain lots of numerical traps. 
We remark that the system is chaotic, i.e. dynamically unstable, and amplifies 
small perturbations. It is thus especially important to keep numerical errors small. 
First, finding the collision time implies finding the positive root of the quadratic 
equation 



-Sa(t c - tf + 5v(t c -t) + Sx = 



(3) 
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where Sa, Sv and Sx are respectively the differences of accelerations, velocities and 
positions between the right and left particles at the common time t (so that Sx > 
and Sa < 0), and t c is their predicted next collision time. That both roots are 
real and that there can be only one positive root follows from the fact that 5aSx is 
negative. As is well known, solving the quadratic equation by the classical formula : 



tc-t: 



—Sv ± y/Sv 2 — 25aSx 
Sa 



(4) 



will lead to severe cancellation when Sv is negative, Sx is small and the positive 
root is sought, that is exactly the case that corresponds physically to nearly free 
motion. Hence wc only use (4) when Sv is positive and with the — determination 
of the radical. When Sv is negative, the classical formula can be re-arranged to : 



tc-t = 



2Sx 



—Sv + ^fSv 2 — 2SaSx 



(5) 



which is stable in this case. Note that both formulas naturally tend to their proper 
limit when Sa — > (the free motion case). In the self-gravitating case and with our 
choice of units, Sa has the constant value —TV -1 for neighboring particles. 
Second, moving the particles forward in time should be written as 



x j(t) — x j + 



(t-Tj) 



(6) 



that, if Oj/2 is precomputed, involves two nested fused multiply-add (madd) opera- 
tions, which on many modern microprocessor are executed in a single clock cycle 
and with a single roundoff error, leading to a saving of three floating-point opera- 
tions over the five needed by the classical expression Xj + Vj(t — Tj) + aj/2(t — Tj) 2 . 
Expression (6) can also be recognized as Horner's rule, which, even on machines 
lacking a madd instruction, saves one multiplication. 

If the positions of two colliding particles are updated according to equation (6), 
they might end up with a slightly different final position or, even worse, the left 
particle could overtake the right one because of roundoff. To prevent this we used 
a common symmetric formula for computing the final position of the pair, 



*£c(^c) 



a, + a j+1 \ f Tj+i - t. 



+ 



x j ~f x j+i 
2 

Vj - v j+1 



+ ( a 3 - a 3+l) ( tc - 

vj + cj , + i 0j ' + .° J ' +1 n/, 



T 3 + l + T 3 

2 

T 3 + l + T 3 



T 3 + l ~ T j\ 1 



2 j 2 

Tj+l+Tj\ 1 



(7) 



which is physically equivalent to moving the center of mass of the two particles. 
Although seemingly complicated, this form is full of repeated subexpressions and 
can thus be evaluated quite efficiently, especially as it again only involves madd oper- 
ations. This expression also gives the benefit of preserving exactly any symmetries 
that might be present in the initial velocity profile because positions, velocities and 
accelerations are combined before any multiplication is done. 
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FIG. 7. Phase space portrait of a self-gravitating system. The initial velocity profile is the 
same as in Fig. 7, i.e. a double sine wave. Although the initial evolution is similar to the free 
motion case, after a few Jeans times, the system develops spiral structures in phase space, due to 
the gravitational attraction preventing the particles from running away. 

Fig. 7 shows phase space portraits of this self-gravitating dynamics with par- 
ticles initially uniformly distributed in space, and velocity a smooth function of 
position. After caustic formation, the system develops a spiral structure in phase- 
space. Fig. 8 shows the same dynamics in the (x, t) plane. As in free streaming 
motion, caustics can be observed. In addition, we see that gravitation stops the par- 
ticles from moving apart as easily after the caustics have been formed and this leads 
to a more concentrated mass agglomeration. Nevertheless, it is worth stressing the 
qualitative agreement between Figs. 6 and 8 for the initial stage of the evolution : 
this is an indirect confirmation of the validity of the Zeldovich approximation. 

An event-driven scheme for the simulation of one-dimensional self gravitating 
systems was first introduced by Eldridge and Feix [3], and has later been further 
developed in the literature [18]. This and all other published schemes however 
always involve at least O(N) operations per collisions, either to find the minimum 
collision time, or to update the particle states after each collision. In our algorithm, 
these two operations need respectively C(log N) and 0(1) operations. 

The theoretical predictions of the performance of the algorithm is confirmed 
by Fig. 9, which shows CPU time per collision vs. number of particles in semi- 
logarithmic scale. The linear dependence on logA^ is clear in the range 300-10000. 
However, there is a significant constant contribution coming from the floating point 
operations needed to update the particles states, which actually still take the lion's 
share of the CPU time for 10000 particles. In the data of Fig. 9 (see caption), 
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FIG. 8. Dynamics in the x, t plane for a self-gravitating system. For large times, all particles 
stay trapped in a finite region of space, with their trajectories intermixing each other until the 
system becomes completely chaotic. 

we reached speeds in excess of 4 x 10 5 collisions/s on an inexpensive Linux PC. 
On a DEC ALPHA-based workstation at 600 MHz, we got speeds of around 1.3 x 
10 6 collisions/s, for TV equal to 1000. Hence the algorithm presented here allows 
the study of fairly large systems for long times on ubiquitous hardware. 

The preceding discussion has made clear that a limiting factor for simulating 
a system up to a given time T cnd is the number of collisions Z that have to be 
performed in that time interval. If we want to approach the continuum limit, that is 
large As, it is especially important to know how Z scales with N. If N mass points 
with total mass equal to one, initially located in an interval of length one, provide 
a discretization of a given velocity function, then, as long as the discretization is 
not felt, one expects that the average time between successive collisions of a given 
particle goes like TV -1 . The total collision rate hence grows as TV 2 . Likewise, if 
we consider discretization of a statistically stationary state, the distance between 
particles would scale as TV -1 while the velocity would be independent of N. Hence, 
also in this case, the collision rate would be proportional to TV 2 . Fig. 10 shows 
the number of collisions vs. time in unit of inverse Jeans' frequency, in double 
logarithmic coordinates, for different number of particles. The curves corresponding 
to different N s are parallel to each other, the separations being the squares of the 
ratio of successive values of N. Fig. 10 hence suggests that the proposed scaling 
holds true for different discretizations of the same initial conditions for all regimes. 
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FIG. 9. CPU time per collision measured as the output of the UNIX library function times () , 
divided by the number of collisions, after 30 Jeans time, for systems of various size iV. The code 
was compiled by gcc in the Linux kernel 2.2 and ran on an Intel Pentium II 450 MHz processor. 
Data points on the left are not very reliable because of the limited resolution of times (). 



Fig. 11 explores the late time regime, and shows the number of collisions vs. time 
in linear scale (the earlier time regimes shown in Fig. 10 are not visible in this 
representation). It is clear from this figure that the collision rate becomes constant 
for times much larger than the inverse Jeans' time. 



5. CONCLUSIONS 

We discussed the implementation of a fast heap-based event-driven scheme for in- 
tegrating numerically one dimensional systems of N interacting particles, provided 
the dynamics can be integrated between two successive collisions. The collision 
times are ordered on a heap, which reduces the complexity to O(logiV) opera- 
tions per collision. As a consequence, for large values of N, the present algorithm is 
faster than earlier algorithms in the litterature, which are 0{N). This opens up the 
perspective of improving the numerical estimates of the statistics of such systems. 

Commonly, a particle system is considered as a discrete approximation to a con- 
tinuum limit, e.g. self-gravitating particles to the Vlasov-Poisson system of coupled 
partial differential equations. If so, the main limitation of the present scheme is Z , 
the number of collisions needed to be performed in a given (intrinsic) time T en( j. We 
presented theoretical arguments and numerical simulations showing that Z gener- 
ally grows as ./V 2 . Hence the total computational cost of reaching a time T C nd 
is 0(g(T en d)N 2 logjV), where g is a function depending also on the initial condition, 
but which becomes proportional to T en d for sufficiently large T en d, i.e. g ~ /T en d 
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FIG. 10. 



Number of collisions vs. time in units of u. for different number of particles. 



We can clearly distinguish one early regime, before many collisions have occurred, one late regime 
(after about WujJ ) where the collision rate becomes constant, and one intermediate regime. For 
all times, the number of collisions goes as N 2 . 



with / independent of TV and 0(1) if v is 0(1) (sec Fig. 11). We believe that one 
cannot do better for one-dimensional particle systems than the algorithm presented 
here, without introducing further approximations. 

For many applications, such as self-gravitating particles, the system is inher- 
ently chaotic, and small errors are amplified by the dynamics. Given some initial 
accuracy e, the best accuracy that can be got at time T cn( j is e e ATond , for some pos- 
itive A. Given this, we might however estimate the relative errors of discretization 
of a given PDE to a particle system, and the roundoff errors from the collisions, 
in order to find the "best" value of N leading to the smallest total error. The 
first error is essentially TV -1 , while the second is r)(f 'T en dZ /N) 1 / 2 , where r\ is the 
machine precision and fT eD dZ/N is the average number of collisions experienced 
by a single particle in the interval [0,T en( j], and the roundoff errors coming from 
each collision being uncorrelatcd. The smallest total error is hence found for N of 
the order of rT 2/3 (/ ^end) -173 and would thus scale as r] 2/3 (f T en d) 1/3 e XT ™ d . We 
see from these expressions that, as expected, the optimal value of N decreases 
with T on( j, and that it is useless to use more than w 10 4 particles in single precision 
and more than 10 10 in double precision. While the second number of particles is 
out of reach of current computers, the first one can be handled by our code (about 
25 CPU seconds per Jeans time for N = 10000). In double precision, we should 
always use values of N as large as possible as the limiting value is very large and 
decreases only very slowly with T en ^. 
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Reduced Time, t CO, 

FIG. 11. A different representation of Fig. 10, for only one value of N, but for much 
longer times and in linear coordinates. The collision rate has clearly become constant at all times 
displayed in this figure. 

It should be added that in both applications presented here, the particle view is 
at least as fundamental as the PDE one. The discretization gives coarse-grained 
noise, compared to the PDE, but this is a real physical effect e.g. in stars clusters [5, 
11]. Furthermore, for investigation of the statistically steady state in such system, 
the relevant errors on global quantities are not growing with time in this regime, 
assuming the validity of the shadowing lemma of dynamical systems theory. 

In the paper, we presented free motion and self-gravitating systems as possible 
applications of our algorithm. Nevertheless, it is worth to stress that the algorithm 
is more general and, for example, can also be applied to models of the motion of 
matter in an expanding Universe [1, 15]. 
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